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1 Introduction 

Higher order spatial statistics characterize non-Gaussian aspects of random 
fields, which are ubiquitous in cosmology: from the Cosmic Microwave Back- 
ground (CMB) to the large scale structure (LSS) of the universe. These ran- 
dom fields arc rich in their properties; they can be continuous or discrete,; 
can have one through three, or even more dimensions; their degree of non- 
Gaussianity ranges from tiny to significant. Yet, there are several techniques 
and ideas, which are applicable to virtually all cosmological random fields, 
be it Lyman-a forests, LSS, or CMB. 

In this lecture notes I concentrate on the classic and widely applicable 
characterization of higher order statistics by joint moments, a.k.a. higher or- 
der correlation functions, and directly related statistics. These statistics are 
very powerful, although have a perturbative nature to them in that they con- 
stitute an infinite (formal) expansion. Clearly, when only the first N terms of 
this expansion are extracted from data, interesting information might remain 
in even higher order terms. This is why a host of alternative statistics (void 
probabilities, wavelets, Minkowski- functionals, minimal spanning trees, phase 
correlations, etc. just to name a few) are introduced and used extensively in 
the literature, in complementary fashion to iV-point functions. More on these 
alternatives appear in other lecture notes of this volume, and if your appetite 
is whet you should read e.g., pQ for more information. The present lecture 
notes serve as an informal introduction to the subject, a starting point for 
further studies, rather than a full-blown review. 

Higher order statistics are complicated due to several factors. Most of 
them arise from the large configuration space of parameters which results 
in a "combinatorial explosion" of terms. For instance, the first non-trivial 
3-point correlation function can in principle depend on 9 coordinates. Taking 
into account applicable symmetries still leaves 3 parameters in real space, and 
6 parameters in redshift space (5 if one uses the distant observer approxima- 
tion) . This dependence on a large number of parameters renders higher order 
statistics quite cumbersome, their measurement, interpretation and visualiza- 
tion surprisingly complex and CPU intensive. For the same reason, theoretical 
prediction of higher order statistics correspondingly involved, non-linearities, 
redshift distortions or projection effects, and bias affect them in subtle and 
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non-trivial ways. In addition, higher order statistics are more sensitive to 
geometry and systematics then their low order cousin, the two-point corre- 
lation function. This means that a comparatively larger and cleaner sample 
is needed for effective studies. Even with the presently available large sur- 
veys, such as SDSS and 2dF, the overlap between well understood theory and 
reliable measurement is in fact disquietingly small. 

Despite the above difficulties, the study of higher order statistics is a re- 
warding one, both intellectually and scientifically. The simple idea that small 
initial fluctuations grew by gravitational amplification is enough to predict 
higher order correlation functions, at least on large scales, to impressive ac- 
curacy. Analytical predictions and simulations contrasted with data have al- 
ready provided a strong support for Gaussian initial conditions, and show 
that our basic picture of structure formation is correct. 

On the other hand, the two-point correlation function alone gives a quite 
simplistic view of LSS, which has been visually demonstrated first by Alex 
Szalay, through randomizing the phases of a highly non-Gaussian simulation. 
This procedure erases a large portion of (altough not all) higher order cor- 
relations, while keeping the two-point statistics (c.f. Fig. 1. in Peter Coles' 
lecture notes in the same volume). It is striking how most structure we see 
in the simulation disappears when higher order information is erased, despite 
that the two-point function (or power spectrum) is the same. 

In addition, the two-point correlation function is degenerate with respect 
to the bias (the difference between the statistics of the observed "light" from 
galaxies and the underlying dark matter), and the amplitude of primordial 
fluctuations. Most methods which extract information from the two-point 
correlation function or power spectrum alone, draw on assumptions about 
the bias to avoid this degeneracy problem, and/or combine with CMB mea- 
surements. Higher order statistics not only yields a more accurate picture 
of the distribution, but resolves the degeneracy of the two-point function by 
constraining the bias. 

These notes focus on phenomenological aspects of higher order correlation 
functions and related statistics. In particular, I emphasize the relationship of 
these statistics with symmetries. Some aspects of this are quite new, and it 
is the only way I know how to ease the pain of the above mentioned "combi- 
natorial explosion" , at least partially. In the next sections I review the most 
important theoretical and statistical information, in particular I present the 
definitions of the statistics, estimators, how errors are calculated, algorithms 
to realize the estimators, and some points on bias, redshift distortions. I de- 
velop symmetry considerations in slightly more detail throughout. For com- 
pleteness, I include the foundations of perturbation theory; a detailed recent 
review is available by 2 which is highly recommended. Finally, I illustrate 
most of the above through an example by working out the theory of condi- 
tional cumulants. Finally, while LSS is emphasized in examples, most of the 
theory is directly applicable to CMB, or any other random field as well. 
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2 Basic Definitions 

The learning curve of higher order statistics is initially steep, partly due to 
the large number of new concepts and definitions which need to be absorbed. 
Here I attempt to collect some of the most basic definitions which will be 
used later. 

A Spatial Random Field A is field which has random values at all spatial 
points. Its corresponding measure is a functional, V(A)DA, the "probability" 
of a realization. This is analogous to a probability density. Averages can be 
calculated via functional integrals, which although not always well defined 
mathematically, are very pictorial. 

Ensemble Average We denote with (A) the ensemble average of some 
quantity A. The averaging corresponds to a functional integral over the den- 
sity measure. Physically this means simply the average of the quantity over 
infinite (or many) independent realizations. 

Ergodicity In the study of random fields it is often assumed that ensemble 
averages can be replaced with spatial averages. Physically this means that 
distant parts of the random field are uncorrelated, and thus a large realization 
of the random field (a "representative volume", hopefully the chunk of our 
universe which is observable) can be used for measurements of statistical 
quantities. 

Joint Moments The most basic set of statistics we can consider to charac- 
terize a random field T are the joint moments of the field taken at N distinct 
points xi, . . -Xn, 



We often work with fluctuation fields which have been normalized with the 
average: S = — 1. Note that the spatial dependence on coordinate Xi is 
often abbreviated as Si. No assumption is made on the dimensionality of the 
random field. One (Lyman alpha forest), two (CMB, LSS projected catalogs), 
or three (LSS redshift surveys) dimensional fields are typical in cosmology. 

Connected Moments Connected moments are denoted with () c and are 
defined recursively with the following formula: 



(Si,..., 6 N ) C = {Si, . .. , S N ) - 5i) c . . . (Sj . . . 5k) c ■ ■ ■ , (2) 



where P denotes symbolically a summation of all possible partitions. In an- 
other words, connected moment of all possible partitions have to be sub- 
tracted from the full (or disconnected) joint moment. What is left is the 
connected moment. 

N -point correlation functions are defined as the connected joint moments 
of the normalized fluctuation field 5 



F { - N \xi,...,x N ) = (T(xi),...,Tx N ) 



(1) 



p 



^ N \l,...,N) = (Si,...,S N ) 



C) 



(3) 
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where we introduce yet another short hand for denoting £ N (xi, . . . ,Xn). The 
three-point correlation function is often denoted with £ instead of £*- 3 \ and 
the two-point correlation function with £ = £( 2 ) . Most statistics we are dealing 
with in this review are special cases of TV-point correlation functions. 

Gaussian A random field is called Gaussian if its first and second moments 
fully determine it statistically. We will often use the field S, which denotes a 
random field whose average is 0. Then a Gaussian field is fully determined 
by its two-point correlation function £ = (6162)- Gaussian random fields have 
trivial higher order moments in that all their connected moments vanish for 
N > 3 (Wick-theorem). 

Non- Gaussian A non-Gaussian field has at least one non- vanishing higher 
than second connected moment. Note that the definition of non-Gaussian 
is highly general (like "non-Elephant") everything what is not Gaussian is 
non-Gaussian. Therein lies one of the highest difficulty of the subject. 

Cumulants correspond to the simplest special case of TV-point functions: 
the joint moment of a degenerate configuration, where all the field values are 
taken at the same place. The usual definition in cosmology is 



(5 2 



Sn = /«\JV-1' ( 4 ) 



where the normalization with the average correlation function £ = (<5 2 ) is mo- 
tivated by the dynamics of perturbation theory. Other normalizations might 
be more suitable, e.g., for CMB. Cumulants depend only on one scale: the 
smoothing kernel of 5 which implicit in the above equation. This fact ac- 
counts for most of their popularity, and indeed they are the only statistics 
(along with cumulant correlators), which have been measured as high order 
as N — 10 for galaxies. An alternative definition is Qm — Sn /N n ~ 2 , which 
corresponds to the division with the number of possible trees graphs. This 
definition typically ensures that for galaxies and dark matter all Qjv's are of 
order unity. 

Cumulant Correlators correspond to the next simplest special case of N + 
M-point functions: the joint moment of a doubly degenerate configuration, 
where the N and M field values are taken at two places, respectively. The 
usual definition in cosmology is 

(£Ns:M\ 

(5 2 } N + N - 1 (S 1 S 2 )N N - 2 M M - 2 ' [ ' 

Other generalizations, e.g. with triple degeneracy, are obvious, but not clear 
whether fruitful. In the above, it is assumed that the smoothing kernel is 
the same for both points, which is usual, although not necessary. When the 
normalization of the possible number of trees is not done, the quantity is 
often denoted with Cn,m- 

Conditional Cumulants are defined as the joint connected moment of one 
unsmoothed and N—l smoothed density fluctuation fields. They can be real- 
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ized by integrals of the TV-point correlation function through N — 1 spherical 
top- hat windows, 



U N (Ri,...,R N - 1 )= / U»i ^-i.O) II d3x * v- (6) 



where Vi — 47r/3i?f is the volume of the window function Wr^. In the most 
general case, each top hat window might have a different radius. The condi- 
tional cumulants are analogous to ordinary cumulants, but subtly different: 
they have an advantageous smoothing kernel which enables their measure- 
ment with an edge corrected estimator (more on this later). 

Power Spectrum Since space is assumed to be statistically invariant under 
translations, it is convenient to to introduce the Fourier transform of the 
D-dimensional field 



The translation invariance causes the independent fc-modes to be uncorre- 
cted, which renders the correlation function of the Fourier modes "diagonal" 



which is the definition of the power spectrum. (S D is the Dirac-5 function.) It 
can be shown that the power spectrum is the Fourier transform of the two- 
point correlation function £. Note that the (2n) D normalization is customary, 
but it is not followed by everybody. 

Poly-Spectra Similarly to the power spectrum, the joint connected mo- 
ments of N- Fourier modes are called a poly-spectra (N — 1-spectrum) modulo 
a Dirac-<5 function. The most important special case is the bispectrum, the 
Fourier transform of the three-point correlation function £ = The next 
order is called tri-spectrum. 

A Discrete Random Field corresponds to all possible arrangements of dis- 
crete objects (whose number can also vary) with the corresponding measure. 
A generalization over the continuous random field which is needed to describe 
the statistics of the distribution objects in space, such as galaxies. 

Poisson Sampling Often it is a good approximation to derive a discrete 
random field from a continuous one via a simple assumption: take an infinites- 
imally small volume, such that the probability of having more than one object 
in the volume is negligible, and simply assume that the probability of having 
one object is proportional to the value of the field within the small volume. 
This is called an infinitesimal Poisson process. 

Discreteness Effects correspond to the difference between the statistics of 
an observed, discrete random field and an assumed underlying, continuous 
random field. Also known as shot noise, or Poisson noise. It can be especially 
simply calculated under the assumption of infinitesimal Poisson sampling. 



N-l 




(7) 



{S kl 6 k2 )=6 D (k 1 +k 2 )(2TT) u P(k), 



(8) 
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3 Estimators 

The main idea underlying the construction of estimators for spatial statis- 
tics is the ergodic theorem. If we replace the ensemble averages with spatial 
averages, we obtain estimators for the above defined quantities. 

To put the above simple idea into practice, one has to deal with some 
subtleties as well. I list a few interesting ones: discreteness, counts in cells, 
edge correction, and optimality. 

3.1 Discreteness 

Imagine a galaxy catalog with average count N in a given cell of size R, our 
chosen smoothing length. We can estimate approximately the density at each 
cell as Si ~ ^ — 1. The above simple idea suggest to estimate the K-th order 
cumulant with (for a quantity A we denote its estimator A throughout). 

JVtot — 

A little more thought, however, reveals that, while our estimator for den- 
sity field is unbiased, our estimator for the cumulant is not. While (N) = 
N(l +5) = N, (N K ) ^ N K ((1 + 5) K ). The reason for this is the "self con- 
tribution" to correlations, a typical discreteness effect. 

To see this, let's follow 0, and imagine for K = 2 that our cell of size R is 
divided into T infinitesimally small cells, each of them small enough that the 
probability of having more than one galaxy in it would be negligible. Let's 
call the number of galaxies in each tiny cells /it,. Since /ij = 0, 1, /i^ = /Zj for 
any K. This means that all moments (nf) = (fii) = N/T. The total number 
of objects in our original cell is Mi- Now it is easy to see that 

<^ 2 > = E = Z>> + X>iX^-}(i + (io) 

i i=£j 

With T — » oo the above expression yields the final results 

(N 2 ) = _N+_N 2 (l + £) 

= N + N 2 ((1+S) 2 ). (11) 

The first term corresponds to the Poisson-noise bias of our estimator. Note 
that we already simplified the case assuming that N and all the normalization 
required to obtain Sk from the above is known a priori. 

Higher orders can be calculated either with the above suggestive, but 
tedious method, or with generating functions one can prove the following 
simple rule 0] 

((N) K )=N K ((1 + S) K ), (12) 
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where {(N) K ) = (N(N - 1) • • • (N - K + 1)) are the factorial moments of N 
(the quantity inside the average is called falling factorial). In other words, 
if you replace regular moments with factorial moments, discreteness effects 
magically disappear. With this simple rule, and taking into account that 
discreteness can only affect overlap, you can construct estimators free from 
discreteness biases for any of our defined estimators. E.g. 



and all you have to do is express algebraically S3 as a function of the factorial 
moments. It is a good exercise, for solutions see [5]- 

Note that both intuition and halo models jHj suggest that Poisson sam- 
pling might not be a good approximation on very small scales for galaxy 
surveys. When in doubt (i.e. no well defined and trusted model for the dis- 
creteness exist), try to use estimators without self-overlap, since discreteness 
manifests itself through self-correlations only. 

3.2 Counts in Cells 

We have shown in the previous subsection how to construct estimators for 
cumulants, cumulant correlators, etc. even in the discrete case. For cumulants, 
however, a more efficient method exists based on counts in cells. 

According to the previous subsection, if we can estimate factorial moments 
(N)k, it is a simple matter of calculation to obtain cumulants. While one 
could directly estimate the factorial moments from data, the usual procedure 
is to estimate first the probability P/v that a cell of a given volume (or area) 
contains iV objects. Then the factorial moments can be calculated as 



The advantage of this technique is that there exist many fast and efficient 
algorithms to estimate Pm from data, and once the counts in cells proba- 
bilities are estimated, arbitrary high order cumulants can be obtained in a 
straightforward manner. 

3.3 Edge Correction and Heuristic Weights 

If signal and noise were uniform, uniform weighting of objects would be op- 
timal as required by symmetry. On scales approaching the characteristic size 
of a survey, uneven (or more generally suboptimal) weighting of objects near 
the edges of the survey will often dominate the errors. The optimal weight 
has contributions from survey geometry and correlations (signal and noise), 



<(Af) 2 >= iV 2 (l + 

<(JV) 3 ) = 7V 3 ((l + 3<5 + 3<5 2 + <5 3 ) 

= ^ 3 (l + 3£ + 3S 3 P) 



(13) 




(14) 



N>Q 
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and it is not even diagonal. These two effects are only separate conceptually, 
but they are often coupled in non-trivial ways. 

Edge correction with heuristic weights fully accounts for the effect of sur- 
vey geometry (including uneven sampling, etc), however, it does not include 
a prior on the correlations. For instance, the heuristic weights used in the 
SpICE method |5 are exact in the noise dominated regime, however, they 
are only approximate when the signal is important. 

A class of edge corrected estimators ^3 can be written symbolically 

as 

i N = w ilt ... tiN (5 il ...d iN ) c (15) 

ii,...,i N 

which is formally the same as the quantity we want to measure, except we 
replace () with an average over the sample itself according to the ergodic 
principle. The heuristic weights are such that their total sum is 1, and usu- 
ally correspond to an inverse variance. This would be exact if the bins were 
uncorrelated, which is usually not the case. It can be a good approximation 
power spectra as shown by 8 (c.f. lecture notes of Andrew Hamilton in this 
volume) . 

Note that above estimator introduce additional subtleties due to taking 
of connected moments, and due to the non-linear nature of the estimator. 
The non-linearity is introduced by calculating the fluctuations via a division 
with the average density. If we estimate the average density from the same 
sample, the bias due to this effect is termed the "integral constraint" . 

In practice, the above estimator is often replaced with a Monte Carlo 
version (jH) for the two-point correlation function, and ^13^2 f° r the iV-point 
functions). Let D, and R denote the data set, and a random set, respectively. 
The latter is a random distribution over the geometry and sampling variations 
in the survey. If we define symbolically an estimator D p R q , with p + q = N 
for a function <P symmetric in its arguments 

DPR" = J2^u---^ p ,yi,...,y q ), (16) 

with Xi Xj 6 / yj £ R. As an example, the two point correlation 

function corresponds to <P(x,y) = [x,y <E D,r < d(x,y) < r + dr], where 
d(x, y) is the distance between the two points, and [condition] equals 1 when 
condition holds, otherwise. 

The estimator for the ./V-point correlation functions is written with the 
above symbolic notation as (D — R) N , or more precisely, 

i ^ ^ 

where S = J <P/j,n, a simple phase space integral, and A, p are the densities of 
data and random sets, (for details see 55 )• This is the Monte Carlo version 
of the edge corrected estimator. 
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It is worthwhile to note is that counts in cells cannot be properly edge 
corrected, although approximate procedures exist in the regime where the 
probabilities have a weak dependence on the cell shape, or one could integrate 
the above estimators numerically. 

3.4 Optimality 

For the measurement of the two-point function for a Gaussian process we 
know that maximum likelihood methods are optimal in the sense that no 
other method can achieve smaller variance. This can be shown with the use of 
the Fisher matrix and the Cramer Rao inequality ^1ED ( see a ls° the lecture 
notes of Andrew Hamilton in this volume) Since our previous prescription for 
edge effects is different (in particular it does not know about correlations), 
it is clearly suboptimal. Given that we know the optimal method, why don't 
we use it for all measurements? The reason is that there are caveats, which, 
especially for galaxy surveys, severely limit the applicability (and optimality) 
of the optimal method. 

Computational issues: The quadratic incarnation of the maximum like- 
lihood method for two-point correlation function (or power spectrum) 
amounts to sandwiching the projection matrix Pi = dC/dl, the deriva- 
tive of the full (signal plus noise) correlation matrix according to the I — th 
parameter C'i between the data vectors weighted by the inverse correlation 
matrix C , yielding 

Ci =< C~ 1 x\P i \C~ 1 x >, (18) 

where Ci is the estimator for Cj. We can see from this that the inverse 
of the correlation matrix needs to be calculated, which typically scales 
as 0(N 3 ) with the number of data items 1 . For N ~ 1000 this can be 
done on a laptop computer (about 17 sec on mine), but for iV ;> 10 6 it 
becomes prohibitive even for the fastest supercomputers (not to mention 
storage capacity, which might also become unrealistic for a large data 
set). Iterative, and brute force maximum likelihood methods, although 
seemingly more complicated, scale the same way. 

State of the art data sets ("mega-pixel" CMB maps, such as WMAP 
and Planck, as well as decent LSS surveys) are way beyond present day 
computational capabilities (unless some special tricks and approximations 
are involved, e.g..|14|). 

Caveats: the "lossless and optimal" property of the maximum likelihood 
estimator is subject to practical limitations. Data reduction and analy- 
sis involve many subtle choices, such as systematic corrections, binning, 

1 There are methods, such as preconditioned conjugate gradients, with which 
C~ 1 x can be calculated in 0(N 2 ). However, the Fisher matrix still needs 0(iV 3 ) 
computations 
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pixellization, etc. In the past these turned into differences between anal- 
yses which all used the same optimal method. In addition, non-iterative 
quadratic maximum likelihood estimators are posteriors based on a prior, 
thus they are only optimal in the limit when we have full prior knowledge 
of what we want to measure. 

The Gaussianity condition: while it is an excellent approximation for 
the CMB, it breaks down for the LSS even on very large scales. In the 
Bayesian sense, the Gaussian prior is not justified, thus the estimators 
might become sub-optimal, and might even be biased. Those measure- 
ments, where variants of the maximum likelihood are method are used 
for LSS, take great pain at controlling "non-linear contamination", e.g. 
with filtering. Besides the computational aspects, this is the main rea- 
son why many LSS analyses still use heuristic weights. However, some 
aspects of the maximum likelihood method have been adapted to deal 
with non-Gaussianities \R>\, an d even for estimating three-point correla- 
tion functions under the assumption of Gaussianity [161 117| (this is not a 
contradiction: when the non-Gaussianity is tiny, as in CMB, correlation 
matrices can be well aproximated with a Gaussian assumption). 

4 Errors 

The formal calculation of errors (or more generally covariance matrices) is 
straightforward from Eauation ll5l one simply takes the square of the equation 
and performs ensemble averages. In general, for two bins represented by the 
weights w a and w b , the errors will scale as 

ii,...,ijv 

In practice, complications will arise from i) the large number of terms ii) the 
complicated summation (or integral in the continuum limit) iii) the overlaps 
between indices and the corresponding discreteness terms iv) the theoretical 
expression of the ensemble averages. However, we can draw the general idea 
from the expression that the covariance matrix of iV-point estimators depends 
on 27V-point correlation functions. Next we demonstrate this formula with 
the calculation of the errors on the two-point correlation function. 

To specify more our symbolic estimator, let us assume that the survey is 
divided into K pixels, each of them with fluctuations with i running from 
1 ... K . For this configuration our estimator can be written as 

i = w 12 S 1 6 2 . (20) 

The above equation uses a "shorthand" Einstein convention: 1, 2 substituting 
for i\,i2, and repeated indices summed, and it is assumed that the two indices 
cannot overlap. 
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The ensemble average of the above estimator is clearly 1012^12- The con- 
tinuum limit (co)variance between bins a and b can be calculated by taking 
the square of the above, and taking the ensemble average. 

{SC5£») = w a 12 w b 3i «5iW4> - . (21) 

Note that the averages in this formula are not connected moments, which are 
distinguished by () c . 

The above equation yields only the continuum limit terms. To add Poisson 
noise contribution to the error, note that it arises from the possible overlaps 
between the indices (indices between two pairweights can still overlap!). In 
the spirit of infinitesimal Poisson models, we replace each overlap with a 1/A 
term, and express the results in terms of connected moments. There are three 
possibilities, i) no overlap (continuum limit) 

<2™34 (6234 + £l3&4 + 6463) , (22) 

ii) one overlap (4 possibilities) 

jw a 12 w b 13 (£123 + 63) , (23) 

iii) two overlaps (2 possibilities) 

I^Xaa + £12), (24) 

In these equations, for the sake of the Einstein convention we used k, I) — 
£,ijki ■ The above substitutions (rigorously true only in the infinitesimal Pois- 
son sampling limit) become increasingly accurate with decreasing cell size. 
For the Monte Carlo estimator of Eq. El a slightly more general formula is 
valid, where the summation is replaced with integrals over a bin function 
^(1,2) 

= js{J ^(1,2)^(3,4)^(1, 2, 3, 4) + 2^(1,3)^(2, 4)] + 

j J <P (1, 2)# 6 (1, 3) K(2, 3) + 6(1,2, 3)] + 

A J * (l,2)$6(l,2)[l+£(l,2)]}. (25) 

For completeness, we present the result for the three-point correlation func- 
tion as well 

$3^> =^{/ ^(1,2,3)^(4, 5, 6) [£(1,2, 3, 4, 5, 6) + 

3£(1, 2)£(3, 4, 5, 6) + 9£(1, 4)£(2, 3, 5, 6) + 
3£(4, 5)£(1, 2, 3, 6) + 9£(1, 5, 6)£(2, 3, 4) + 
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9£(1, 4)£(2, 3)^(5, 6) + 6£(1, 4)£(2, 5)£(3, 6)] 
^ / # (l,2,3)# 6 (l,4,5)fe(l,2,3,4,5) + 



C(2, 3, 4, 5) + 2£(1, 2)£(3, 4, 5) + 
2£(1,4)£(2,3,5) + £(2,3)£(1,4,5) + 
4£(2,5)£(1,3,4)+£(4,5)£(1,2,3) + 
£(2,3)£(4,5) + 2£(2,4)£(3,5)] + 

/ * (1,2,3)JP 6 (1 J 2,4)[£(1,2,3,4) + 



In the above formula, £ with iV variables is an iV-point correlation function, 
<&a is a bin function corresponding to a triangle (it's value is one, when the 
triplet is in the bin, otherwise). 

The above integral is fairly complicated to calculate in practice, usually 
some approximations are necessary. In series of papers [181 |SJ El has worked 
out practical approximations for up to N = 4 for the moments of CIC, 
obtained (lengthy) semi-analytical formulae, and checked validity of those 
approximations against simulations. The resulting code (FOrtran for Cosmic 
Errors, FORCE is available publicly. 2 The errors of CIC have the special 
property that they contain terms cx 1/C, where C is the number of cells 
used for the CIC estimation (measurement errors). This is the motivation for 
algorithms with have C = oo. 

From the above error calculations an intuitive physical picture has emerged. 
We can distinguish three classes of errors, often approximately separated in 
the final expressions: finite volume errors (the term cosmic variance is used in 
CMB literature), corresponding to the the fact that the universe might have 
fluctuations on scales larger than the survey size (the smallness of such fluc- 
tuations often termed qualitatively as "representative volume" ) ; discreteness 
errors: arising from the fact that we are sampling the dark matter distribu- 
tion (presumably continuous) at finite points with galaxies; edge effect errors, 
arising from the uneven weights given to galaxies during the estimation. For 
typical galaxy surveys, edge effects dominate on large scales, and discreteness 
on small scales. Finite volume effects change slower, and they might domi- 
nate in the transition region on intermediate scales. In the above example for 
the two-point function, the first term contains mainly finite volume effects 
(the integral of the 4-point function), and terms with powers of 1/A are dis- 
creteness effects. Edge effects are due to the complicated weight summation 
blended with the other effects. 

2 http: //www. ifa.hawaii.edu/users/szapudi/force. html 
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These qualitative observations are valid for CMB as well with some 
caveats: i) the CMB is Gaussian to good approximation, therefore typically 
only two point functions need to be taken into account ii) the CMB is contin- 
uous, i.e. there are no discreteness effects, hi) instead, there are terms arising 
from the noise of the detection with similar role to discreteness (noise) in 
LSS. Note that for high precision cosmology applications constraining cos- 
mological parameters, "error on the error" (or uncertainty of the uncertainty 
as sometimes called) is as important as the size of the error. 

5 Symmetry Considerations 

A class of functions subject to (Lie-group) symmetries is best represented 
as an expansion over irreducible representations. In Euclidean (flat) space 
translation invariance is the most obvious symmetry. The appropriate trans- 
form is the Fourier transform, and homogeneity can be taken into account 
as a Dirac delta function in transform space. The customary definition of 
the power spectrum and bispectrum are with the Fourier transform of the 
random fluctuation held 6(k) as 

(6(ki)S(k 2 )) = (2n) D S D (k 1 + k 2 )P(h) 

(S(k 1 )5{k 2 )6(k 3 )) = {2^) D 5 D {k l +k 2 + k 3 )B(k u k 2 , k 3 ), (27) 

where 5d is a Dirac delta function, and D is the spatial dimension. Thus, be- 
cause of homogeneity, the two-point function becomes the Fourier transform 
of the power spectrum 

£(zi,z 2 ) = J -^nP(kV k{xi - X2 \ (28) 
and a similar equation is true for the three-point correlation function 

Z(x 1 ,x 2 ,x 3 ) = J ^t 1 ^pS(fci,fc2,fc 3 )e 4(feia;i+fc2a:2+fc3 ^ ) fe(fc 1 + fc 2 + fc3). 

(29) 

From these equations, the two-point correlation function is only a function 
of the difference of its two arguments. If the statistical properties of the 
underlying held are isotropic as well, these equations can be further simplified. 
We quote the results for two and three spatial dimensions: 

£(»0 = J ^P(k)J (kr) 2D, 

e(r)=/^P(*)io(*r)3D, (30) 

where Jo and jo are ordinary and spherical Bessel functions, respectively. 

As a consequence of spherical symmetry the three-point correlation func- 
tion and the bispectrum, depend only on the shape of a triangle. |19j has 
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observed that the three-point statistics can be expressed with two unit vec- 
tors, thus zero angular momentum bipolar expansion is suitable in 3 spatial 
dimensions under SO (3) symmetry. Zero angular momentum bipolar func- 
tions are proportional to the Legendre polynomials, thus in turn this becomes 
multi-pole expansion of the bispectrum. If we parameterize the bispectrum 
as depending on two vectors, and the angle between, this can be written as 

B(kx,k 2 ,9) Bi(ki,k 2 )Pi(cos9)^—, (31) 

i 

with an entirely similar expansion for the three-point correlation function. It 
is then simple matter to show ^5] that the multi-poles of the bispectrum are 
related to the multi-poles of the three-point correlation function via a double 
Hankel transform 

tf(ri,ra) = J ^dki^dk 2 (-l) l B l (k 1 ,k 2 )ji(kin)ji(k 2 r2). (32) 

-B;'s and £/ are analogous to the C/'s of the angular power spectrum, they 
correspond to an angular power spectrum of a shells of three-point statistics. 

In two dimensions, the situation is entirely analogous: the symmetry is 
just U(l) rotations on a ring, thus the correct expansion if Fourier (actually 
cosine) transform 

B(ki,k 2 ,6) = Bo{kl 2 k2) +J2B n (k u k 2 )co S (n9), (33) 

n<0 

with analogous expansion for the three-point correlation functions. Note that 
the sin modes are killed by parity. This expansion is generally applicable 
for any three-point statistics in isotropic 2-dimensional spaces. This in turn 
renders the connection of the Fourier coefficients of the three-point correlation 
function and bispectrum as 

/T 7 
^dfci^dfc a (-l) n fl„(fci, fc 2 )J n (fc 1 n)J n (A: 2 r 2 ). (34) 

This latter equation, derived for two-dimensional Euclidean space, is also 
applicable to the spherical bispectrum in the flat sky approximation. This 
provides surprisingly accurate results, even for relatively law multi-poles. A 
form equivalent to the expansion of Eq. 1341 was proposed independently by 
|2U) as well, for the (redshift space) projected bispectrum; this is yet another 
application of the flat sky approximation. 

6 Algorithms 

Estimation higher order statistics is a daunting computational problem. Naive 
solutions are typically ranging from the inefficient (for two-point correlations 
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and counts in cells) to the impossible, hence the application of advanced 
computer science is a must. 

For counts in cells, the naive algorithms scales as MNq where is preferably 
Nc J> 10 9-12 , the number of cells used for the estimation. 

Naive estimation estimation of TV-point quantities typically scales M N 
where M is the number of data points. For modern data sets M ;> 10 6 ~ 10 , 
this becomes prohibitive for higher than second order correlation functions. 
This is a fast developing field. Next we present a set of algorithms to illustrate 
the problems and typical solutions. 



6.1 iV-point Correlation Functions 
Hierarchical Algorithms 




3 4 



1 2 3 4 5 
!_• i • • I 

Fig. 1. This figure illustrates a tree data structure of points in 1 dimensions. In 
this construct, points spatially close, are also stored nearby. Tree structure is similar 
for continuous field, except there is no early termination of the tree (such as that 
of number 5 on the figure). 



Finding (joint) averages like (Si . . . Sjy) of a random field S, where the 
TV-points are taken at a particular configuration, can be done through sum- 
mation of TV-tuples. For the proper averages, the same algorithm is used for 
the data, and a random set describing the geometry of the survey. For the 
Monte Carlo estimators an ensemble of mixed TV-tuples need to be counted, 
but again the same algorithm applies without further complexity. The al- 
gorithm for TV = 2 is described next; higher orders are exactly analogous, 
although more tedious to describe. For details see [Jl] 
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The spatial configuration of the two-point function is characterized by the 
distance r between the two points: this should lie within a bin, i.e. b\ < r < 6 2 - 
An algorithm capable of calculating the sum Ylr 2 <b f° r an y ^ yields the 
answer for a proper bin b\ < r < 62 as a difference. 



Fig. 2. A particular state of the algorithm is represented. The two squares illustrate 
two nodes of the tree, which contain the points inside. The number of points is 
cashed, i.e. available at this level of the tree. The pairs of distances between points 
are marked with lines. If case i) or ii) is true (see text), none of these distances have 
to be checked, which in turn furnishes the speed-up of the algorithm. The smaller 
squares within the large square represent the case iii) when the tree has to be split; 
then the query is run again recursively. 

The data points or the pixels of the continuous field can be arranged in 
a tree data structure. At every level we store the number of elements (or 
values) below, i.e. sufficient cashed statistics. The sum needed then can be 
calculated by a recursive double search on the tree starting from the root. 
If the two nodes are ri\ and 712 with values <5i and ^2 let us denote with r 
any distance between points of n\ and n 2 (the lines between the squares of 
Figure [21 If we can prove that i) for any r > b — > return, or ii) for any r < b 
— > add Si x 62 to the sum and return, (we are done), else iii) split the tree 
and continue recursively. In worst case this procedure descends to the leaves 
of the tree, but typically huge savings are realized when whole chunks of data 
can be taken into account at coarser levels of the tree. 

Algorithms Based on Fourier Transforms 

As shown next, pair summation can be reformulated to make use of Fast 
Fourier Transforms, one of fastest algorithms in existence. If P{x) = ciq + 




Introduction to Higher Order Spatial Statistics in Cosmology 



17 




II) 1 10 1 
klhMpc- 1 ] 



Fig. 3. The power spectrum measured in the VLS (Virgo) simulations. The two- 
point correlation function have been measured with a fast hierarchical algorithm, 
and the edge corrected power spectrum was obtained from Eq. l3UI The errorbars are 
calculated from 8 sub-cubes from the simulation, and the two curves show the linear 
and non- linear theoretical power spectra [22] . The measurement of the correlation 
function with the Fourier algorithm is even faster, and produces identical results. 

a\x + . . . + a n —\ is a polynomial, and e = e l7X% l r unit roots, the coefficients 
of a discrete Fourier transform of the series can be defined as 

a k = P(e k ) =a + a x e k + ... + a n ^e^ k . (35) 

Direct calculation confirms that ife^fe+zi can be calculated by Fourier trans- 
forming the series a.i and 6^, multiplying the resulting afc&£, and finally inverse 
Fourier transforming back. This simple observation is the discrete analog of 
the Wiener-Khinchin theorem. To obtain correlation one has to work through 
subtleties involved with with the periodic boundary conditions, and multidi- 
mensionality. The final result is a probably fastest algorithm for calculating 
correlation functions. 

The algorithm in broad terms consists of i) placing the points in a suf- 
ficiently fine grid, storing the value iVk, the number of objects at (vector) 
grid point k, (this step is omitted if the density field is given, such as CMB 
maps), ii) calculating fluctuations of the field by 5 — (N — (N))/(N), iii) 
discrete Fourier transform with a fast FFT engine, iv) multiplying the co- 
efficients v) Fourier transform back. The same procedure is followed for the 
mask with zero padding large enough to avoid aliasing effects. The resulting 
inhomogeneous correlation function is a generalization of the one obtained in 
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the previous subsections; the usual (homogeneous) correlation function can 
be obtained by summing over spherical shells. Edge effect corrected power 
spectrum is obtained with yet another Fourier transform. Measurement of the 
correlation function on a 768 3 grid takes about 15 minutes on one Opteron 
processor (cf. Figure ??). 

Pure Fourier algorithms are not practical for edge corrected three-point 
statistics. However, it is possible to combine hierarchical and Fourier meth- 
ods, to obtain an algorithm which is faster than either of them. In Figure 
show measurements with such an algorithm in 8 realizations of 260 million 
particle VLS |5j simulations in a 239.5h~ 1 Mpc cube. The measurement us- 
ing ~ 1400 bins on a 100 3 grid (= 10 1 5 triangles) took less than three hours 
on a 2.4Ghz single CPU. 

r 1 =15.3378Mp<A"V 2 =ig.9792Mpc/r 1 



0.8 
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e 

Fig. 4. The three-point correlation function measured with a Fourier based al- 
gorithm. The data used is the same as Fig. [3] Theory (solid line) is calculated 
through Eq. 1391 Pure leading order perturbation theory is used, no smoothing ef- 
fects or non-linearities are taken into account, which probably account for small 
differences, besides cosmic variance. Note that errorbars, estimated from the same 
8 sub-cubes as in the previous figure, are highly correlated. 



6.2 Counts in cells algorithms 

I present four different algorithms for counts-in-cells, because each of them 
is optimal for different purposes: algorithm I. calculates counts in cells with 
infinite sampling for small scales, II. is a massively oversampling algorithm for 
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large scales, III. is a massively oversampling algorithm for intermediate/small 
scales, while IV. is a massively oversampling algorithm for large scales and 
arbitrary cell shapes/sizes. II. and IV have very similar performance, although 
II. is more straightforward to generalize for lensing and compensated filters, 
while IV. has more flexibility in terms of cell shape/size. Together I- IV. covers 
the full dynamic range of future surveys, with ample overlap between them 
for cross check. 

CIC I: sweep 




Fig. 5. Illustrates the geometric calculation of counts in cells. There are four points 
within the solid boundary. The centers of square cells can lie within the dashed 
boundary. Around each point a square is drawn to represent the possible centers 
of cells which contain that point. The problem of counts in cells can now be refor- 
mulated as calculation of the ratios of all overlap areas (represented with different 
shadings on the figure) within the dashed boundary. 
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This algorithm uses the well known "sweep" paradigm from computa- 
tional geometry, and it realizes the ideal case of C = oo number of "random" 
cells thrown. Here we summarize the basic idea in two dimensions only. 

The geometric interpretation of the probability of finding N galaxies in a 
randomly thrown cell is shown on Figure[S] There are four points in a rectan- 
gular box.sa Around each object (large dots) a square is drawn, identical to 
the sampling cell used for counts in cells. The possible centers of random cells 
all lie within the dashed line, which follows the boundary of the bounding 
box. Since the square around each point corresponds to the possible centers of 
(random) cells containing that same point, the question can be reformulated 
in the following way: let us partition the area of the possible centers of cells 
according to the overlap properties of the cells drawn around the objects. 
The ratio of the area with N overlaps to the total area corresponds to Pjy- 

Imagine a rigid vertical line moving slowly from the left of Figure |3J 
towards the right; the boundary can be ignored temporarily. Before the line 
touches any of the squares, it sweeps through an area contributing to Pq. 
Therefore at the point of first contact all the swept area contributes to Pq and 
can be recorded. After the contact the line is divided into segments sweeping 
through areas contributing to Pq and Pi respectively. The boundaries of these 
segments can be imagined as two markers on the line, corresponding to the 
upper and lower corner the square being touched. As the sweep continues, the 
results can be recorded at any contact with the side of a square during the 
movement of the line: the areas swept are assigned according to the markers 
on the line to different P/v's. This is done with a one dimensional sweep on the 
line counting the two kinds of markers. Then the segmentation of the line is 
updated. Whenever the line makes contact with the left side of a square, two 
markers are added, whenever it touches the right hand side of a square, the 
corresponding markers are dropped. The boundaries and rectangular masks, 
can be trivially taken into account by only starting to record the result of 
the sweep when entering the area of possible centers. Non-rectangular masks 
can be converted to rectangular by putting them on a grid. 

If there are N objects in the plane, the above procedure will finish after 2N 
updating. The algorithm can be trivially generalized for arbitrary rectangles, 
any dimensions. For instance in three dimensions the basic sweep is done 
with a plane, while the plane has to be swept by a line after each contact. 

From the definition of the algorithm it follows that the required CPU 
time scales as N D '{dj 'I / )- D (- D_1 )/ 2 in D = 2,3 dimensions, where N is the 
number of objects, d/L is the ratio of the scale of the measurement to the 
characteristic survey length. While for large scales and large D the algorithm 
becomes impractical, it is clear that for small scales it will be quite fast. It 
turns out that this is the regime where C — oo is the most important |18| . 
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Fig. 6. The successive convolution algorithm is illustrated in one dimension. The 
solid line represents the grid containing the data, while the dash line corresponds 
to auxiliary storage repeating the first 2 cells. At the 0th level data is placed in 
the grid, and counts in cells are calculated. At levels 1 and 2 the solid and dashed 
arrows respectively represent summations described in the text. 

CIC II: successive convolution 

This algorithm is essentially a Fourier (or renormalization) style convolution. 
It will be explained in one dimension for simplicity, generalization is obvious. 

The computations are performed on the largest possible grid with N seg- 
ments which can be fit into the memory of the computer: this determines 
the smallest possible scale L/N, where L is the box size, and N is the base 
sampling. A hierarchy of scales are used, with the scale at a given level being 
twice the scale at one level lower. The results one step lower in the hierarchy 
are used to keep the number of sampling cells constant even at the largest 
scales. Counts in cells can be straightforwardly calculated on the resolution 
scale of the grid, i.e. the smallest scale considered. For the calculation at twice 
the previous scale the sum of two cells are always stored in one of the cells, 
for instance in the one with smaller index. Because of the periodic boundary 
conditions, auxiliary storage is required to calculate the sum of the values 
in the rightmost cell (if the summations was done left to right), as its right 
neighbor is the leftmost cell which was overwritten in the first step. After 
these preparatory steps counts in cells can again be calculated from the N 
numbers representing partially overlapping cells. For the next level, twice the 
previous scale, one needs the sum of four original resolution cells: a calcu- 
lation simply done by summing every other cell of the previous results into 
one cell. At this level, two auxiliary storage spaces are needed because of the 
periodicity. In general, at each level in the hierarchy two cells of the previous 
results are summed as a preparatory step, and counts in cells are calculated 
simply by computing the frequency distribution of the N sums stored in the 
main grid. Auxiliary storage is needed for those rightmost cells, which have 
the periodic neighbors on the left end. 

In D dimensions 2 D cells are summed in the preparatory step, and the 
auxiliary storage space enlarges the original hypercube. The scaling of this 
algorithm is ClogC ~ GlogG with a preparation step linear in N. 
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CIC III: tree 

This alternative technique for small scales uses a tree data-structure, similar 
to the algorithm defined for the TV-point correlation functions; it is explained 
in three dimensions for convenience. 

The tree data structure can be thought of as an efficient representation 
of a sparse array, since at small scales most of the cells are empty in a grid 
spanning the data. The tree is built up recursively, by always dividing the 
particles into two groups based on which half of the volume they belong to. 
The same function is called on both halves with the corresponding particles 
until there is no particle in the volume, or the scale becomes smaller than a 
predetermined value. At each level the scale and the number of particles are 
known, and when an empty volume is reached, all contained volumes are also 
empty. These two observations are enough to insert the book-keeping needed 
to calculate counts in cells at all scales while the tree is built. The number 
of sampling cells at each level are 2 l , where I is the level; the original box is 
represented by I — 0. Towards smaller scales the number of cells increases. 
When N 3 = 2 l , where N is the size of the largest grid of the previous al- 
gorithm, the two techniques should (and do) give the exact same answers. 
At larger scales the previous algorithm is superior, since N > 2 l , while this 
algorithm becomes useful at smaller scales. Just as above, this procedure can 
be further improved by shifting the particles slightly before calculating the 
tree. However, since this hierarchy of grids has different numbers of cells, 
random shifts arc more advantageous. Shifting by a fraction of the smallest 
scale would not exhaust the possibilities for any larger scale, while shifting 
by a fraction of the largest grid might not shift the underlying grids at all. 
With the introduction of random shifts (ovcrsampling grids), the dynamic 
range of algorithms II and III will develop a substantial overlap, which will 
be useful for testing. This algorithm also scales as N log N (with preset depth 
limiting). 

CIC IV: cumulative grid 

Algorithm II. produces counts in cells results on scales 2 k l, where I is the scale 
associated with the base grid. For calculating counts-in-cells distribution for 
arbitrary scales [fa x I, fa x I], the following construction will be explained in 
two dimensions; generalization is obvious. 

Let us denote the value of the field riij at the grid-point Let us 

define another grid, with values c%j — X) P <i q <j n pq- Then the number of 
elements in a cell described by (i + k\,j + fa) can be calculated simply 

from Cij + Ci+fcjj+fcj — Ci+k t j — Cij+k 2 ( a we U known trick in computational 
geometry). The preprocessing is proportional to N, and counts in cells for 
arbitrary rectangle can be calculated is linear with C ~ G. 
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Fig. 7. In the cumulative grid algorithm, each grid point is replaced with a sum 
of all elements corresponding to rectangles on the figure, with (0,0) as the lower 
left, and the grid point as the upper right coordinate. The dark shaded area is the 
required sum, which is calculated as dj + c i+ kxj+k^ — c i+ k 1 j — Cij + k 2 (see text). 



7 Perturbation Theory 

On large scales, where the fluctuations are reasonably small, clustering of 
cosmic structures can be understood in terms of Eulerian weakly non-linear 
perturbation theory (PT). An excellent exposition is found 2 with substan- 
tial reference list. My goal next is to give an extremely cursory, recipe level 
introduction to the subject. 

PT predicts the behavior of any statistics from Gaussian initial condi- 
tions, assuming an expansion of the density field into first, second, etc. order, 
5 = 5^ +<5( 2 ) + . . .. This assumption, when substituted into Euler's equations 
for the cosmological dark matter "fluid" , yields a set of consistent equations 
for the different orders. The resulting perturbative expansion is most conve- 
niently expressed in Fourier space with kernels Fn as, 

S {N) (k) = J d\ 1 . . . d 3 q N F N ( qi , q N )6 D ( qi + . . .+q N -k)5 (1 \ qi ) . . . 6^(q N ) 

(36) 

and an analogous equation for the velocity potential. Euler's equations lead 
to a simple, albeit "combinatorially exploding" recursion for the kernels |2.3j . 
The most important kernel is the first non-trivial F-i 

i ? 2 (qi,q 2 ) = l + /i+(- + -)cos(0) + (l- A1 )cos(0) 2 , (37) 

92 91 
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where 9 is the angle between the two vectors (in this equation explicitly 
denoted by bold face), and fj, = f ^"V^o 

The leading order calculation of e.g., any third order statistic can be ob- 
tained simply by (<5 3 ) c = ((5^ + 5^ + .. .) 3 } c = 3(($W) 2 <f( 2 ))+bighar orders. 
For Gaussian initial conditions, this leads to formula in terms of P(k), the 
initial (or linear) power spectrum, and the F 2 kernel. 

When the above expansion is executed carefully PT predicts [23 E3 l^fil 
127) that the bispectrum in the weakly non-linear regime is 

{ it + P0{X) + (fe + fe) Pl{x) + \ (1 ~ ^ P2{X) ] P ( fc i) p ( fc 2)+Perm. 

(38) 

where x is the cosine of the angle between the two wave vectors, Pi are Leg- 
endre polynomials, and P(k) is the linear power spectrum. We have written 
the F 2 kernel in terms of Legendre polynomials, to make it explicitly clear 
that the first permutation depends only on terms up to quadrupole. 

Together with the formulae in section El this fact can be used for the 
prediction of the first permutaion of three-point correlation function in the 
weakly nonlinear regime: 

e (ri,r 2 ) = jS dk iS dk 2 (I + ^))P(ki)P(k 2 )jo(kir 1 )j a (k 2 r 2 ) 
£(ri,ra) = -!^dk x ^dk 2 (| + Pik^P^n^rMk^) 

£f(n,r 2 )= f^dk 1 ^ s dk2l(l-fi)P(k 1 )P(k 2 )j2(kir 1 )j 2 (k 2 r 2 ) (39) 

I emphasize that the above equation corresponds to the first permutation of 
the perturbation theory kernel: the other permutations are dealt with the 
same way, and the results are added. All the integrals factor into one dimen- 
sional ones, which simplifies the calculation in practice. It can be shown with 
tedious calculations that the above formulae are equivalent to [2H1- These 
equations has been used for the predictions of Figure ^] One has to be some- 
what careful to integrate the Bessel functions with sufficient accuracy. 

For completeness we present the analogous formulae for the projected 
three-point correlation function (see §9), which is integrated over line of sight 
coordinates to avoid the effects of redshift distortions. The perturbation the- 
ory kernel is simply rewritten in Fourier modes |2()j 

{ (I " ^) + (I + Y) COS{0) + COs{29) ] + Perm. 

^ (40 ) 

which yields the equivalent result for the projected three-point function £ p 

as 

Co(n,r 2 ) = / ^dh^dk 2 (I - ^))P(k 1 )P(k 2 )J (k 1 r 1 )J (k 2 r 2 ) 
Cf(ri,r 2 ) = - J ^dk^dk 2 (fi + P{k 1 )P(k 2 )Ji(k 1 r 1 )J 1 (k 2 r 2 ) 
Q 2 (ri,r 2 )^ J ^dk x ^dk 2 \{\- ii)P{k 1 )P{k 2 )J 2 {k l r 1 )J 2 {k 2 r 2 ) (41) 
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Here both and fcj are two-dimensional vectors perpendicular to the line of 
sight, and the other two permutations have to be added up as previously. 

8 Bias 

Most observervations record the distribution of "light" at some wavelength 
(except some still singular cases, such as neutrino astronomy, or air shower 
detectors), might have spatial distribution different from that of the underly- 
ing dark matter or "mass" . This difference means that our estimators might 
give us a "biased" view of the statistics of the underlying dark matter. This 
bias is caused by the complex interplay of non-linear gravitational dynamics 
with dissipative physics. While there have already been advances in ab ini- 
tio modeling of this complicated process, the most fruitful approach is still 
phcnomenological. 

Since galaxy formation is a stochastic process, we can imagine that there is 
no one-to-one correspondence between dark matter and galaxy density, the 
former only determines the probability of forming a galaxy (together with 
its environment and merger history). This general case is called stochastic 
biasing |291 13U). while, if there is a functional dependence between the two 
fields, it is deterministic. In mock catalogs created using semi-analytic galaxy 
formation models e.g., |31| . most of the stochasticity of the bias is simply due 
to shot noise |32| . 

If there is a functional relationship between the dark matter and galaxy 
density fields, the simplest possible relation is linear, the more general ones 
are non-linear. 

Linear bias is described by the equation of 5 g — bSoM, where b is a 
constant. Note that this is an inconsistent equation for b > 1 yielding the 
minimum value of the galaxy density field —6 < —1, obviously non-sense. 
Yet, owing to its simplicity, this is the single most used bias model. 

A non- linear generalization contains an arbitrary function S g = /(<5dm)- 
This function can either be assumed to be a general function with a few 
parameters (e.g. exponential), or can be expanded. The two most used ex- 
pansions are Taylor expansion, or expansion into Hermite-polynomials (and 
thereby expanding around a Gaussian). 

In the weakly non-linear regime, the coefficients of the Taylor expansion 
are the non-linear bias coefficients; 

S g =.f(6 DM ) = J2% S DM- (42) 

N 

This equation can be used perturbatively to calculate the moments of the 
biased (galaxy) density field in terms of the moments of the underlying field, 
and bias coefficients [331 [35] . In typical applications, the functional re- 
lationship is set up between smoothed fields: it is important to note that 
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smoothing and biasing are two operations which do not commute. There- 
fore the bias coefficients have a somewhat obscure physical meaning, as they 
depend not only on the physics of galaxy formation, but on the nature of 
smoothing itself. Note also, that the zeroth coefficient bo is set to ensure 
(5g) = 0, i.e. it is non-zero. 

Symmetries can be used used to construct estimators which constrain the 
bias in the weakly non-linear regime. To second order, the biased reduced 
bispectrum transforms as q = Q/b+ bijb 2 jSE], where the lower case denotes 
the galaxy (measured), and the upper case the dark matter (theory) values. 
It is clear that 62 can only effect the monopole term. Thus a simple estimator 
for the bias can be constructed as ^1 

b _ Qi _ Q2 

11 92 

b 2 = q a b 2 - Q b. (43) 

According to the equations, the quadrupole to dipole ratio does not depend 
on the bias, thus it serves as a novel, useful test of the underlying assumptions: 
a quasi-local perturbative, deterministic bias model and perturbation theory. 
Figure 4. shows the dipole to quadrupole ratio for BBKS, and EH power 
spectra, respectively. The range of fc's to be used for bias extraction can be 
determined from contrasting the measurements with these predictions. Note 
that scales where baryon oscillations are prominent are barely accessible with 
present redshift surveys. On smaller scales non-linear evolution is likely to 
modify these prediction based purely on leading order perturbation theory 

(E2- 
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Fig. 8. Left: the dipole to quadrupole ratio 3/5Q1/Q2 is plotted for a BBKS theory 
(thick solid line) for k = 0.01. The right panel shows the same using the EH fit for 
comparison, featuring baryonic oscillations. 

The above simple theory illustrates that three-point statistics can con- 
strain the first two bias parameters in the weakly non-linear regime. Similarly 
four point statistics constrain the first three parameters, etc. 

An alternative to the above simple theory, and possibly more useful on 
smaller scales, is the halo model. The physical picture is simple [38]: we try 
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to model the distribution by some entities, "halos" , which have a relatively 
simple distribution (from perturbation theory). Large scale correlations thus 
follow from the "halo-halo" type terms, while all the small scale statistics 
follows from the halo profile ("1-halo term"), and the distribution of halo 
masses. In these notes we cannot give justice to halo models, the interested 
reader is referred to JH|, and references therein. Halo models have considerable 
success in describing the two-point statistics of the dark matter distributions, 
and they provide a reasonable approximation (currently at the 20% level) to 
the three-point statistics. Galaxy bias is then described by the halo occupa- 
tion probabilities. 

For halo modell predictions, higher order Bessel integrals are needed |3T)] . 
but the principle is the exactly the same as Eq. 123 (see also 01]] for a different 
approach). While halo models are physically motivated, the best fit parame- 
ters differ for two and three-point statistics thus the physical meaning of the 
parameters is still somewhat nebulous. 

9 Redshift Distortions 

Redshift distortions, together with biasing, represent the most uncertain as- 
pect of the phenomenology of higher order statistics. Here I only offer a 
superficial overview of this complicated topic; for a comprehensive review of 
redshift distortions on two-point statistics, see 

The physical idea is fairly simple: in redshift surveys, the radial distance 
to an object is deduced from its redshift using the Hubble relation in our 
expanding universe. Therefore, the "peculiar velocity", the velocity of an 
object with respect the the average Hubble flow, will add to the measured 
distance. Such a space, where position of an object is given in a spherical 
coordinate system with two angles, and a radial distance which contains (a 
random) velocity is called redshift space, as opposed to real space (the one 
we would really want to measure). 

The spatial distribution of objects will no longer be translation invariant, 
although it will still have rotational symmetry around the observer. The 
deviation from translation invariance is of the observed distribution is called 
"redshift distortion". Qualitatively, it is mainly due to two distinct effects: 
on small scales, velocity dispersion of virialized clumps will look elongated 
along the line of sight: popularly called the "finger of god" effect. On larger 
scales, infall velocities cause distortions perpendicularly to the line of sight, 
the Kaiser-effect. 

The breaking of translational symmetry means that the redshift-space 
quantities depend on larger number of parameters, e.g. the two- and three- 
point correlation functions will depend on 3 and 6 parameters respectively 
(2, and 5 in the distant observer approximation). 

Redshift distortions can be taken into account in perturbation theory. In 
the distant observer approximation, when all lines of sights can be take to be 
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parallel, the effect has been calculated for the three-point function by |42|. 
The general case needs a complex multipolar expansions; the results are fairly 
tedious and will be presented elsewhere. 

Besides predicting the redshift space quantities using theory or simula- 
tion, on can sweep redshift distortions under the rug by introducing the 
"projected" correlation functions. These assume a distant observer approxi- 
mation, where one can consider the iV-vectors to pointing from the observer 
to N vertices of an iV-point configuration parallel. Then, one can introduce 
7r — (t coordinates with 7Tj vectors parallel to the line of sight, while the 
<7j vectors perpendicular to it. 

The important point about this parametrization is that, approximately, 
only 7r coordinates are affected by redshift distortions. A convenient, redshift 
distortion free quantity is obtained by integrating over the redshift direction, 
i.e. 7r coordinates. The resulting object, the "projected Appoint correlation 
function", integrated over N — 1 h coordinates is free of redshift distortions. 
Although it is similar to angular (projected) correlation function, the units of 
the a coordinates are still h _1 Mpc, and, if properly done, its signal to noise 
is higher. 

Fourier space analogs of the above idea use k± and fey for perpendicular 
and parallel k vectors. E.g., the redshift space bispectrum is parametrized 
by the five parameters B(ki ! Xi ^3,Xj k\,\\-> &2,l|)) with _L denoting trans- 
verse, and || parallel quantities with respect to the line of sight in the distant 
observer approximation. Interestingly, the real space bispectrum can be esti- 
mated from taking k% t \\ ~ k 2 _\\ ~ [B?j . 

10 Example: Conditional Cumulants 

In my lectures, I presented an detailed example using a new set of statistics, 
conditional cumulants, closely following It illustrates most of the generic 
features of the theory, baring its strengths and weaknesses. 

Conditional cumulants represent an interesting and sensible compromise 
between A-point correlation functions and cumulants measured from mo- 
ments of counts in cells. As we will see next, they can be understood as 
degenerate A-point correlation functions, or integrated monopole moments 
of the bispectrum, and they are closely related to neighbor counts. They share 
accurate edge corrected estimators with A-point correlation functions, yet, 
they are as straightforward to measure and interpret as counts in cells. 

10.1 Basics 

The general conditional cumulants, Eq.|^l have been defined as the joint con- 
nected moment of one unsmoothed and A — 1 smoothed density fluctuation 
fields. In the most general case, each top hat window might have a different 
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radius. Further simplification arises if all the top hats are the same, i.e. we 
define Ufq{r) with n = ... = rjv_i = r as the degenerate conditional cumu- 
lant c.f. C/jv subtly differs from the usual cumulant of smoothed field £ N 
by one less integral over the window function. 

The second order, 1/2, is equivalent to the (confusingly named) J3 integral 
e.g., 0, 

U 2 (r) = ^J 3 (r) = J P(k)w(kr)A7rk 2 dk , (44) 

where w{kr) — 3(sinfcr — kr cos kr) / (kr) 3 is the Fourier transform of W r (s), 
and P(k) is the power spectrum. 

For higher orders, we can construct reduced conditional cumulants as 

Both Un and Rn have deep connection with moments of neighbor counts 
e.g., [3] as we show next. Let us define the partition function Z[J] = 
(exp(/ iJp)) c.f., 0], where p is the smooth density field. Then we can use 
the special source function iJ{x) = W(x)s + Sr>(x)t to obtain the generating 
function G(s, t). This is related to the generating function of neighbor counts 
factorial moments as G(s) = dtG(s,t)\t=o- Explicitly, 

„, . \ -» (snv) \ ^ (snv) N — 

G ( fi ) = E HIT Um+1 6XP E -J^~^ , (46) 

M>0 ' JV>1 

where nv — N is the average count of galaxies, and £ = Ui = 1 by definition. 
This generating function can be used to obtain C//v's and/or i?/v's from neigh- 
bor count factorial moments analogously as the generating functions in |45j 
for obtaining Sat's from factorial moments of counts in cells. For complete- 
ness, the generating function for neighbor counts' distribution is obtained 
by substituting s — * s — 1, while the ordinary moment generating function 
by s — > e s — 1. Expanding G(e s — 1) recovers the formulae in [3], §36. The 
above generating function allows the extraction of Un from neighbor count 
statistics for high N. The situation is analogous to the CIC theory presented 
in |45) . and it is fully applicable to neighbor counts statistics with minor and 
trivial modifications. 

So far our discussion has been entirely general; in what follows we will 
focus on N — 3, i.e. the first non-trivial conditional cumulant U3. ^(fi,^) 
is directly related to bispectrum by 

U 3 (ri,r 2 ) = f 5 ( k i J k 2 J k 3 )fe(k 1 + k 2 +k 3 ) 

w(k 1 r 1 )w(k2r2)d 3 k 1 d 3 k 2 d 3 k3 , (47) 

where Sd is the Dirac delta function. To further elucidate this formula, we use 
the multipole expansion of bispectrum and three point correlation function 
of Eqs.EIlandE21to find 
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U 3 (n,r 2 ) = TfetftfCain^titidridrz (48) 

= I dk 1 dk 2 ^^-ji(k 1 r 1 )j 1 (k 2 r2)B (k 1 ,k 2 ) , (49) 

in which j\ is the first order spherical Bessel function. We see the U3 de- 
pends only on the monopole moment of the bispectrum/three-point correla- 
tion function. This property significantly simplifies the transformation of this 
statistic under redshift distortions. 



10.2 C/3 in the Weakly Non-linear Perturbation Regime 

Using the general machinery of perturbation theory, one can predict the con- 
ditional cumulants to leading order as matter of simple calculation. For third 
order, the results, as usual, will depend on the F 2 kernel, and the linear power 
spectrum (or correlation function). We leave as an exercise for the reader to 
show that 



#3(ri,r 2 ) 



U 2 (r 1 )U 2 (r 2 ) 



'11 

21 



1 



U r l, r 2 ) 

U 2 (r t ) 



Ox 1^2) 
U 2 (r 2 ) 



1 

+ 3 
1 

+ 3 



£(ri,r 2 ) 
U 2 ( ri ) 

U 2 (r 2 ) 



d\nU 2 (r 2 ) 
d In r 2 

dlnU 2 ( ri ) _ 
din ri 



d In g(n ,r 2 ) 
d In r 2 

d In g(ri,r 2 ) 
d In ri 



(50) 



in which £(ri,r 2 ) — 2^2 J k 2 P(k)W(kr 1 )W(kr2)dk. The special case when 
n = T2 = r reads 
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where a 2 = J k 2 P(k)W 2 (kr)dk. Note the similarity of R 3 with the 
skewness, which is calculated in weakly non-linear perturbation theory as 
S3 = 34/7-dlna 2 /dlnr H3E|. 



10.3 Measurements of Conditional Cumulants in Simulations 

U n {r) can be measured similarly to iV-point correlation functions. For in- 
stance U2 can be thought of as a two point correlation function in a bin 
[^io,Thi] = [0, r]. Taking the lower limit to be a small number instead of 0, 
one can avoid discreteness effects due to self counting (this is equivalent to 
using factorial moments when neighbor counts are calculated directly). Given 
a set of data and random points, the class of estimators ofll7lprovides an edge 
(and incompleteness) corrected technique to measure conditional cumulants. 
Existing iV-point correlation function codes can be used for the estimation; 
for higher then third order, one also has to take connected moments in the 
usual way. 
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Fig. 9. Predictions of weakly non-linear perturbation theory for third order con- 
ditional cumulant U3 (left), and the reduced statistics R3 (right) are in real space 
(solid line) are compared with measurements in N-body simulations (triangles with 
errorbars) with ylCDM cosmology. The agreement is remarkable above :> 20h _1 Mpc 

While the above suggests a scaling similar to TV-point correlation func- 
tions, the relation to neighbor count factorial moments outlined in the previ- 
ous section can be used to realize the estimator using two-point correlation 
function codes. To develop such an estimator, neighbor count factorial mo- 
ments need to be collected for each possible combinations, where data and 
random points play the role of center and neighbor. 

Note that the edge correction of Eq. (fT7|l is expected to be less accurate 
for conditional cumulants, than for iV-point correlation functions, however, 
the estimator will be more accurate than CIC estimators. Several alternative 
ways for correcting edge effects are known, which would be directly applicable 
to conditional cumulants 0H1EHHH31- In what follows, we use Eq. (|T7|) for all 
results presented. 

To test PT of the conditional cumulants, we performed measurements in 
ylCDM simulations by the Virgo Supercomputing Consortium jSJ. We used 
outputs of the Virgo simulation and the VLS (Very Large Simulation) . Except 
for box sizes and number of particles, these two simulations have identical 
cosmology parameters: J? m = 0.3, f2 v = 0.7, r = 0.21, h — 0.7 and as = 0.9. 
In order to estimate measurement errors, we divide the VLS simulation into 
eight independent subsets each with the same size and geometry the original 
Virgo simulation. In total, we have used the resulting nine realizations to 
estimate errors. Note that we corrected for cosmic bias by always taking the 
average before ratio statistics were formed. 

Our measurements of the second and third order conditional cumulants 
are displayed in Figs. 1 and 2, respectively. Results from EPT (Eq. [3] are 
denoted with solid lines. The measurements in simulations are in excellent 
agreement with EPT, especially on on large scales > 20h _1 Mpc. 
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10.4 Redshift Distortions 

As three-dimensional galaxy catalogs are produced inherently in redshift 
space, understanding effects of redshift distortions on these statistics is crucial 
before practical applications can follow. In the distant observer approxima- 
tion, the formula by |52II53| is expected to provide an excellent approximation 
for Uz(r). According to §2, we only need to consider the monopole enhance- 
ment 



where / « fi^f. This formula essentially predicts a uniform shift of the real 
space results. To test it, we repeated our measurements in redshift space, and 
found that the above is indeed an excellent approximation in redshift space. 

Considering the relatively simple, monopole nature of the statistics, we 
expect that the overall effect on U3 should also be a simple shift, similarly 
to the Lagrangian calculations by and the more general Eulerian results 
by 021 • Specifically, we propose that ratio of R3 in redshift space to that in 
real space can be approximated by 



This is motivated by the notion that the shift from redshift distortions of equi- 
lateral triangles should be similar to the corresponding shift for our monopole 
statistic. Our simulations results (see Fig. 3) show that this simple idea is 
indeed a surprisingly good approximation, although the phenomenological 
theory based on the above formula appears to have ~ 5% bias on scales 
;> 20h _1 Mpc where we expect that weakly non- linear perturbation theory is 
a good approximation. For practical applications, this bias can be calibrated 
by iV-body, or 2LPT |M] simulations. 

In addition to the above simple formula, we have calculated the shift due 
to redshift distortions by angular averaging the bispectrum monopole term 
in 021- We have found that the results over-predict redshift distortions, how- 
ever, they would agree with simulations at the 1-2% level if we halved the 
terms classified as FOG (finger of god). At the moment there is no justifica- 
tion for such a fudge factor, therefore we opt to use the above phenomenology, 
which is about 5% accurate. While redshift distortions of third order statis- 
tics are still not fully understood due to the non-perturbative nature of the 
redshift space mapping (R. Scoccimarro, private communication), detailed 
calculations taking into account velocity dispersion effects will improve the 
accuracy of the redshift space theory U3. 

For applications to constrain bias, one has to keep in mind that redshift 
distortions and non-linear bias do not commute. However, at the level of 
the above simple theory, it is clear that one can understand the important 
effects at least for the third order statistic. There are several ways to apply 
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Fig. 10. Same as Figure 2, lower panel, but for R% in redshift space. The solid line 
shows phenomenological model based on Equation 1531 The theory appears to be a 
reasonable approximation at the 5% level. 

conditional cumulants for bias determination, either via combination with 
another statistic |55j . or using the configuration dependence of the more 
general i?3(ri,r 2 ). One also has to be careful that in practical applications 
ratio statistics will contain cosmic bias [SB]. We propose that joint estimation 
with U2 and U3 will be more fruitful, even if i?3 is better for plotting purposes. 
Details of the techniques to constrain bias from these statistics, as well we 
determination of the bias from wide field redshift surveys is left for future 
work. 

Another way to get around redshift distortions is to adapt conditional cu- 
mulants for projected and angular quantities. Such calculations are straight- 
forward, and entirely analogous to those performed for S3 in the past. Further 
possible generalization of our theory would be to use halo models jS] to ex- 
tend the range of applicability of the theory well below 20h _1 Mpc. These 
generalizations are left for subsequent research. 
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